How to Truly Understand How LLMs Work — From Scratch
If you’re genuinely interested in how ChatGPT or any other large language model (LLM) is trained end-to-end, the best way is to read and debug the code locally.
I prefer a step-by-step debugging approach: compare the actual code with the algorithms described in research papers. You'll gain far more practical, hands-on understanding this way than from reading papers alone.
A great starting point is: 👉 karpathy/nanochat (Python) — a minimal chat model built from scratch that’s easy to understand.
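To make "trained end-to-end" concrete before diving into a real codebase: every LLM, nanochat included, is trained on next-token prediction. The sketch below is not nanochat's code; it is a deliberately toy character-level bigram model fit by counting, which shows the same objective (given context, predict the next token) stripped of neural networks entirely.

```python
# Toy illustration of next-token prediction, the core objective of LLM
# training. NOT nanochat's code: a character-level bigram model fit by
# counting, so the idea is visible without any neural network machinery.
from collections import defaultdict

def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, ch):
    """Return the most frequent follower of `ch`, or None if unseen."""
    followers = counts.get(ch)
    if not followers:
        return None
    return max(followers, key=followers.get)

model = train_bigram("hello hello hello")
print(predict_next(model, "h"))  # 'e' always follows 'h' in this corpus
```

A real model replaces the count table with a transformer and the argmax with a softmax over logits, but the training signal is the same.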
For Those Who Love Low-Level Details
If you’re familiar with C/C++, you can dive even deeper into the implementation level:
These repositories reveal how each component of an LLM works under the hood, from tokenization to attention to optimization. Once you grasp these details, you can even optimize the C/C++ implementations for specific hardware, improving efficiency or porting them to embedded or edge devices.
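The kernel those low-level implementations spend most of their effort on is scaled dot-product attention. Here is a minimal NumPy sketch of that single operation (one head, no masking, no batching), which is the reference behavior a hand-optimized C/C++ version has to reproduce:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q @ K^T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (T, T) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # weighted mix of values

T, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Production kernels fuse these steps, add causal masking, and tile for the memory hierarchy, but this is the math they all compute.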
Fine-Tuning Existing Models
If you want to fine-tune open-source models, use the Hugging Face Transformers framework. It supports models like Qwen, Gemma, and many others.
You can fine-tune them on your own domain data to improve performance for niche or specialized applications — such as legal text, finance, healthcare, or any vertical where data specificity matters.
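A practical detail in domain fine-tuning is preparing the data so loss is computed only on the answer, not the prompt. The sketch below uses fake token IDs (character codes) as a stand-in for a real tokenizer; the `IGNORE_INDEX` value of -100 is the label that Hugging Face Transformers (and PyTorch's cross-entropy loss) skips, while `build_example` is a hypothetical helper, not a library API. The exact prompt format depends on each model's chat template.

```python
# Sketch of supervised fine-tuning data prep: turn a (prompt, answer) pair
# into a token sequence whose labels mask out the prompt, so the model is
# only trained to produce the answer. Token IDs here are fake (character
# codes); a real setup would use the model's tokenizer.
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

def build_example(prompt, answer):
    prompt_ids = [ord(c) for c in prompt]  # stand-in for tokenizer(prompt)
    answer_ids = [ord(c) for c in answer]
    input_ids = prompt_ids + answer_ids
    # Prompt positions get IGNORE_INDEX so they contribute no loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return {"input_ids": input_ids, "labels": labels}

ex = build_example("Q: 2+2? A:", " 4")
print(len(ex["input_ids"]), ex["labels"][:3])
```

Getting this masking right matters as much as hyperparameters: training on the prompt tokens teaches the model to parrot your questions instead of answering them.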
Hardware Isn’t a Barrier
You don't need high-end GPUs to start learning. A consumer GPU with ~10GB of VRAM is enough. Just tune the parameters (model size, batch size, sequence length, etc.) until the training run fits in memory. The key is to learn by experimentation, not just by reading.
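A quick back-of-the-envelope calculation shows why model size is the first knob to turn. Assuming mixed-precision training with Adam in fp32 (roughly 2 bytes of weights + 2 of gradients + 8 of optimizer state per parameter, and ignoring activations and framework overhead), memory scales linearly with parameter count:

```python
# Rough rule of thumb for full-training VRAM, assuming fp16 weights and
# gradients plus fp32 Adam moments: ~12 bytes per parameter. Activations
# and framework overhead come on top, so real usage is higher.
def training_vram_gb(n_params, bytes_per_param=2 + 2 + 8):
    return n_params * bytes_per_param / 1024**3

for n in (125e6, 1.3e9, 7e9):
    print(f"{n/1e9:.2f}B params ~= {training_vram_gb(n):.1f} GB (+ activations)")
```

So a ~125M-parameter model trains comfortably on a 10GB card, a ~1.3B model is already at the edge, and anything larger needs gradient checkpointing, LoRA-style adapters, or smaller batches and sequences.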
My Learning Project
I’ve also forked and started my own experimental repo: 👉 tiny-llm-learn
My approach:
Read → Debug → Modify → Test → Build
Through this cycle, you’ll be able to create your own small, niche, and practical LLMs tailored for specific applications — without needing a data center.
💡 Takeaway: Learning LLMs is not just about understanding the math — it’s about connecting theory with code. Start small, explore the internals, and gradually build up to models that solve real problems.